1 Sequence similarity search and access methods: On the efficient 96%
evaluation of relaxed queries in biological databases

Yangjun Chen , Duren Che , Karl Aberer 

Proceedings of the eleventh international conference on Information and
knowledge management November 2002

In this paper, a new technique is developed to support the query relaxation in 
biological databases. Query relaxation is required due to the fact that queries tend not 
to be expressed exactly by the users, especially in scientific databases such as 
biological databases, in which complex domain knowledge is heavily involved. To treat 
this problem, we propose the concept of the so-called fuzzy equivalence classes to 
capture important kinds of domain knowledge that is used to relax queries. This co ... 

2 Reports from KDD-2001: KDD Cup 2001 report 87% 
Jie Cheng , Christos Hatzis , Hisashi Hayashi , Mark-A. Krogel , Shinichi Morishita , David
Page , Jun Sese

ACM SIGKDD Explorations Newsletter January 2002 
Volume 3 Issue 2 

This paper presents results and lessons from KDD Cup 2001. KDD Cup 2001 focused 
on mining biological databases. It involved three cutting-edge tasks related to drug 
design and genomics. 

3 Bioinformatics: BIOMIND-protein property prediction by property 85% 
proximity profiles

Deendayal Dinakarpandian , Vijay Kumar 

Proceedings of the 2002 ACM symposium on Applied computing March 2002
We present the infrastructure of a bioinformation system called BIOMIND, which 
exploits the close relationship between the structural and functional properties of 
proteins. The scheme presented here views proteins as complex entities with
structural and functional properties, and searches are based on distances along each 
property axis. Explicitly, this allows one to frame complex queries using quantitative 
criteria that confer more discerning power than systems based on a text-m ... 



4 Poster session: A system for knowledge management in bioinformatics 84% 

Sudeshna Adak , Vishal S. Batra , Deo N. Bhardwaj , P. V. Kamesam , Pankaj Kankar , 
Manish P. Kurhekar , Biplav Srivastava

Proceedings of the eleventh international conference on Information and 
knowledge management November 2002 

The emerging biochip technology has made it possible to simultaneously study 
expression (activity level) of thousands of genes or proteins in a single experiment in 
the laboratory. However, in order to extract relevant biological knowledge from the 
biochip experimental data, it is critical not only to analyze the experimental data, but 
also to cross-reference and correlate these large volumes of data with information 
available in external biological databases accessible online. We address this p ... 



5 Background and overview for KDD Cup 2002 task 1: information 82% 
extraction from biomedical articles

Alexander Yeh , Lynette Hirschman , Alexander Morgan 
ACM SIGKDD Explorations Newsletter December 2002 
Volume 4 Issue 2 

This paper presents a background and overview for task 1 (of 2 tasks) of the KDD 
Challenge Cup 2002, a competition held in conjunction with the ACM SIGKDD 
International Conference on Knowledge Discovery and Data Mining (KDD), July 23—26, 
2002. Task 1 dealt with detecting which papers, in a set of fruitfly genetics papers 
(texts), contained experimental results about gene products (transcripts and proteins), 
and also within each paper, which genes had experimental results about their products 
me ... 



6 Facilitating transformations in a human genome project database 82% 

S. B. Davidson , A. S. Kosky , B. Eckman 

Proceedings of the third international conference on Information and knowledge
management November 1994

Human Genome Project databases present a confluence of interesting database 
challenges: rapid schema and data evolution, complex data entry and constraint 
management, and the need to integrate multiple data sources and software systems 
which range over a wide variety of models and formats. While these challenges are not 
necessarily unique to biological databases, their combination, intensity and complexity 
are unusual and make automated solutions imperative. We illustrate these problems 
in ... 



7 Panels: Biodiversity and biocomplexity informatics: policy and 80% 
implementation science versus citizen science 

P. Bryan Heidorn 

Proceedings of the second ACM/IEEE-CS joint conference on Digital libraries July 
2002 

Biological science is one of the top ten social trends and the twenty-first Century has 
been defined as "The Age of Biology" [1]. One of the central themes of this age is 
biodiversity. Biodiversity is the richness of life. Biodiversity includes the variety of 
genes within one species through the complex interconnection of all life within an 
environment. One of the grand challenges of the twenty-first century is to document 
and understand the world's natural heritage. The management of the many k ... 

8 Lab report special section: information retrieval research in the 80%
University of Sheffield

Peter Willett 

ACM SIGIR Forum December 1997
Volume 31 Issue 2 



9 Analogical reasoning for knowledge discovery in a molecular biology 80%
database

Juergen Haas , Jeffrey S. Aaronson , G. Christian Overton 

Proceedings of the second international conference on Information and
knowledge management December 1993



10 The human genome project and informatics 80% 

Karen A. Frenkel 



Communications of the ACM November 1991 
Volume 34 Issue 11 



11 Location-based services and mobile computing: algorithms: A road 77% 
network embedding technique for k-nearest neighbor search in moving
object databases 

Cyrus Shahabi , Mohammad R. Kolahdouzan , Mehdi Sharifzadeh 
Proceedings of the tenth ACM international symposium on Advances in 
geographic information systems November 2002 

A very important class of queries in GIS applications is the class of K-Nearest Neighbor 
queries. Most of the current studies on the K-Nearest Neighbor queries utilize spatial 
index structures and hence are based on the Euclidean distances between the points. 
In real-world road networks, however, the shortest distance between two points 
depends on the actual path connecting the points and cannot be computed accurately 
using one of the Minkowski metrics. Thus, the Euclidean distance may no ... 



12 Algorithms on Strings, Trees, and Sequences: Computer Science and 77%
Computational Biology

Dan Gusfield 

ACM SIGACT News December 1997 
Volume 28 Issue 4 



13 Invited papers: Mining the human genome using virtual reality 77% 

Bram Stolk , Faizal Abdoelrahman , Anton Koning , Paul Wielinga , Jean-Marc Neefs ,
Andrew Stubbs , An de Bondt , Peter Leemans , Peter van der Spek

Proceedings of the Fourth Eurographics Workshop on Parallel Graphics and
Visualization September 2002

The analysis of genomic data and integration of diverse biological data sources has 
become increasingly difficult for researchers in the life sciences. This problem is
exacerbated by the speed with which new data is gathered through automated 
technology like DNA microarrays. We developed a virtual reality application for 
visualizing hierarchical relationships within a gene family and for visualizing networks 
of gene expression data. Integration of other information from multiple databases with 
th ... 

14 Data exploration: HD-Eye: visual clustering of high dimensional data 77%

Alexander Hinneburg , Daniel A. Keim , Markus Wawryniuk 
Proceedings of the 2002 ACM SIGMOD international conference on Management
of data June 2002

Clustering of large data bases is an important research area with a large variety of 
applications in the data base context. Missing in most of the research efforts are 
means for guiding the clustering process and understanding the results, which is 
especially important for high dimensional data. Visualization technology may help to 
solve this problem since it provides effective support of different clustering paradigms 
and allows a visual inspection of the results. The HD-Eye (high-dim. e ... 



15 Research session: data warehousing and archive: Archiving scientific 77% 
data

Peter Buneman , Sanjeev Khanna , Keishi Tajima , Wang-Chiew Tan 

Proceedings of the 2002 ACM SIGMOD international conference on Management
of data June 2002

We present an archiving technique for hierarchical data with key structure. Our 
approach is based on the notion of timestamps whereby an element appearing in 
multiple versions of the database is stored only once along with a compact description 
of versions in which it appears. The basic idea of timestamping was discovered by 
Driscoll et al. in the context of persistent data structures where one wishes to track 
the sequences of changes made to a data structure. We extend this idea to deve ... 



16 General applications: Complex and interconnected systems: optimistic 77% 
parallel simulation of a large-scale view storage system

Garrett Yaun , Christopher D. Carothers , Sibel Adali , David Spooner 
Proceedings of the 33rd conference on Winter simulation December 2001

In this paper we present the design and implementation of a complex view storage 
system model that is suitable for execution on an optimistic parallel simulation engine.
What is unique over other optimistic systems is that reverse computation as opposed
to state-saving is used to support the rollback mechanism. In this model, a hierarchy 
of view storage servers are connected to an array of client-side local disks. The term 
view refers to the output or result of a query made on the part of ... 



17 A proposed undergraduate bioinformatics curriculum for computer 77% 
scientists

Travis Doom , Michael Raymer , Dan Krane , Oscar Garcia 

ACM SIGCSE Bulletin , Proceedings of the 33rd SIGCSE technical symposium on 
Computer science education February 2002 
Volume 34 Issue 1 

Bioinformatics is a new and rapidly evolving discipline that has emerged from the 
fields of experimental molecular biology and biochemistry, and from the artificial
intelligence, database, and algorithms disciplines of computer science. Largely because 
of the inherently interdisciplinary nature of bioinformatics research, academia has 
been slow to respond to strong industry and government demands for trained 
scientists to develop and apply novel bioinformatics techniques to the rapidly-growi ... 



18 Efficient algorithms for document retrieval problems 77% 
S. Muthukrishnan

Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete
algorithms January 2002

We are given a collection D of text documents d_1,...,d_k, with Σ_i |d_i| = n, which may be
preprocessed. In the document listing problem, we are given an online query
comprising of a pattern string p of length m and our goal is to return the set of all 
documents that contain one or more copies of p. In the closely related occurrence 
listing problem, we output the set of all positions wi ... 



19 Distributed query evaluation on semistructured data 77% 
Dan Suciu 

ACM Transactions on Database Systems (TODS) March 2002
Volume 27 Issue 1 

Semistructured data is modeled as a rooted, labeled graph. The simplest kinds of 
queries on such data are those which traverse paths described by regular path 
expressions. More complex queries combine several regular path expressions, with 
complex data restructuring, and with sub-queries. This article addresses the problem 
of efficient query evaluation on distributed, semistructured databases. In our setting, 
the nodes of the database are distributed over a fixed number of sites, and the ... 



20 NBDL: a CIS framework for NSDL 77% 
Joe Futrelle , Su-Shing Chen , Kevin C. Chang

Proceedings of the first ACM/IEEE-CS joint conference on Digital libraries January
2001 

In this paper, we describe the NBDL (National Biology Digital Library) project, one of 
the six CIS (Core Integration System) projects of the NSF NSDL (National SMETE 
Digital Library) Program. 

1 Book reviews: Introduction to constraint databases 77%

Bart Kuijpers

ACM SIGMOD Record September 2002
Volume 31 Issue 3



2 Pattern discovery and forecasting: An iterative strategy for pattern 77%
discovery in high-dimensional data sets

Chun Tang , Aidong Zhang

Proceedings of the eleventh international conference on Information and
knowledge management November 2002

High-dimensional data representation in which each data item (termed target object) is
described by many features, is a necessary component of many applications. For
example, in DNA microarrays, each sample (target object) is represented by thousands
of genes as features. Pattern discovery of target objects presents interesting but also
very challenging problems. The data sets are typically not task-specific, many features
are irrelevant or redundant and should be pruned out or filtered for the ...



3 A cost model for query processing in high dimensional data spaces 77% 

Christian Bohm 

ACM Transactions on Database Systems (TODS) June 2000
Volume 25 Issue 2 

During the last decade, multimedia databases have become increasingly important in 
many application areas such as medicine, CAD, geography, and molecular biology. An 
important research topic in multimedia databases is similarity search in large data sets. 
Most current approaches that address similarity search use the feature approach, 
which transforms important properties of the stored objects into points of a high- 
dimensional space (feature vectors). Thus, similarity search is transformed ... 